## 'data.frame': 1173 obs. of 13 variables:
## $ year : Factor w/ 23 levels "1977","1978",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ violent : num 414 419 413 448 470 ...
## $ murder : num 14.2 13.3 13.2 13.2 11.9 10.6 9.2 9.4 9.8 10.1 ...
## $ robbery : num 96.8 99.1 109.5 132.1 126.5 ...
## $ prisoners : int 83 94 144 141 149 183 215 243 256 267 ...
## $ afam : num 8.38 8.35 8.33 8.41 8.48 ...
## $ cauc : num 55.1 55.1 55.1 54.9 54.9 ...
## $ male : num 18.2 18 17.8 17.7 17.7 ...
## $ population: num 3.78 3.83 3.87 3.9 3.92 ...
## $ income : num 9563 9932 9877 9541 9548 ...
## $ density : num 0.0746 0.0756 0.0762 0.0768 0.0772 ...
## $ state : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ law : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
* The variables murder and robbery have the strongest positive linear relationship with the violent crime rate, with correlations of 0.827 and 0.907, respectively.
    + Because these variables are components of the violent crime rate, we expected their correlations to be high.
    + They are subcategories of violent crime and are therefore not good predictors for a linear model.
* Other variables with high correlations were prisoners (0.703), density (0.665), afam (0.57), cauc (-0.573), and income (0.408).
    + The law variable was also a good indicator: the box plot showed that, on average, states with a shall-carry law in effect have a lower violent crime rate, as well as lower murder and robbery rates.
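Correlations like the ones above, and the group comparison behind the law box plot, can be computed in base R. This is a minimal sketch: the built-in `mtcars` data stand in for `Guns` so the code runs without extra packages (with AER installed, `data("Guns", package = "AER")` loads the real data, where the response would be `violent` and the grouping factor `law`). The helper `cor_with_response()` is hypothetical.

```r
# Correlation of every numeric column with a chosen response column.
# cor_with_response() is a hypothetical helper, not part of any package.
cor_with_response <- function(df, response) {
  num <- df[vapply(df, is.numeric, logical(1))]  # keep numeric columns only
  r <- cor(num)[, response]                      # Pearson correlations with the response
  r <- r[names(r) != response]                   # drop the response's self-correlation
  round(sort(r, decreasing = TRUE), 3)
}

# Stand-in data: mtcars ships with R. With AER installed, use Guns and "violent".
cors <- cor_with_response(mtcars, "mpg")
print(cors)

# Group means of the response by a two-level factor, the numeric counterpart
# of a box plot comparison (am plays the role of law here).
mt <- transform(mtcars, am = factor(am, labels = c("no", "yes")))
grp_means <- aggregate(mpg ~ am, data = mt, FUN = mean)
print(grp_means)
```

Factor columns such as `year`, `state`, and `law` are skipped automatically by the numeric filter.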
* The Fit Plot shows that the actual values deviate considerably from the fitted values (pink reference line).
* The Residual Plot shows that most observed values fall within ±500 of the fitted values.
    + This could be considered a substantial deviation from the fitted values.
    + Next, let's examine whether a log transformation or a Box-Cox transformation would be useful.
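The "most residuals within ±500" reading of the residual plot can also be checked numerically. A sketch on simulated data so it runs anywhere; the helper `share_within_band()`, the data-generating values, and the band of 100 are illustrative assumptions. With the Guns data, one would pass the untransformed model and a band of 500.

```r
# Share of observations whose residual lies within +/- band of the fitted value.
# share_within_band() is a hypothetical helper for illustration.
share_within_band <- function(fit, band) {
  mean(abs(residuals(fit)) <= band)
}

set.seed(1)
sim <- data.frame(x = runif(200, 0, 10))
sim$y <- 50 + 30 * sim$x + rnorm(200, sd = 100)  # illustrative data only
fit <- lm(y ~ x, data = sim)

# Should be near 0.68 here, since the simulated errors are normal with sd 100.
share_within_band(fit, 100)
```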
* The Log Transformed Fit Plot shows that the actual values deviate considerably from the fitted values (blue reference line).
* The Transformed Fit vs. Residual Plot is easier to read than the one for the previous model (m5).
    + The y-axis shows the residuals (.resid) as z-scores, and the majority of the data fall within ±1 standard deviation of the mean.
    + Moving forward, we will use the log transformation of the violent variable to make the data easier to interpret.
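A sketch of fitting on the log scale and putting residuals on a z-score-like scale with base R's `rstandard()` (internally studentized residuals: each residual divided by its estimated standard deviation, adjusted for leverage). The simulated, right-skewed data are an illustrative assumption; with Guns the call would be along the lines of `lm(log(violent) ~ ., data = Guns)`.

```r
set.seed(2)
sim <- data.frame(x = runif(300, 1, 10))
sim$y <- exp(1 + 0.3 * sim$x + rnorm(300, sd = 0.4))  # illustrative skewed data

m_log <- lm(log(y) ~ x, data = sim)  # response modeled on the log scale
z <- rstandard(m_log)                # standardized residuals (z-score-like scale)

mean(abs(z) <= 1)                    # share of points within +/- 1 sd of zero
# plot(fitted(m_log), z) would draw the corresponding residual plot.
```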
## Estimated transformation parameter
## Y1
## 0.6022324
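The code that produced this estimate is not shown. One standard way to estimate a Box-Cox λ is `MASS::boxcox()`, which profiles the log-likelihood over a grid of λ values. The sketch below uses simulated data built so that `sqrt(y)` is linear in `x` (true λ = 0.5). The grid and sample size are illustrative, and this is not necessarily the method used for the 0.602 above.

```r
library(MASS)  # boxcox(); MASS ships with standard R installations

set.seed(3)
sim <- data.frame(x = runif(500, 1, 10))
# Illustrative data: sqrt(y) is linear in x, so the "true" lambda is 0.5.
sim$y <- (2 + 1.5 * sim$x + rnorm(500, sd = 0.5))^2

fit <- lm(y ~ x, data = sim, y = TRUE)  # keep the response so boxcox() can use it
bc  <- boxcox(fit, lambda = seq(-1, 2, by = 0.01), plotit = FALSE)
lam <- bc$x[which.max(bc$y)]            # lambda that maximizes the profile log-likelihood
lam

# Refit on the transformed scale, as in the report's violent^lam models.
m_lam <- lm(y^lam ~ x, data = sim)
```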
Although there is not much separation between the models, the Box-Cox transformed model seems to fit the best.
* The residuals of the Box-Cox transformed model in the plots above appear to have the most normally distributed histogram.
##
## Shapiro-Wilk normality test
##
## data: residuals(m1)
## W = 0.97023, p-value = 8.514e-15
##
## Shapiro-Wilk normality test
##
## data: residuals(m2)
## W = 0.97727, p-value = 1.293e-12
##
## Shapiro-Wilk normality test
##
## data: residuals(mlam)
## W = 0.98664, p-value = 6.98e-09
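All three Shapiro-Wilk tests reject normality (every p-value is below 0.001), but the residuals of the Box-Cox transformed model (mlam) have the largest W statistic (0.987, versus 0.970 for m1 and 0.977 for m2), so they are the closest to normal. Based on our analysis, we will use the Box-Cox transformed model.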
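The comparison above boils down to running `shapiro.test()` on each model's residuals and comparing the W statistics. A sketch on simulated data where the correct λ (0.5) is assumed known. The model names `raw`, `log`, and `boxcox` are illustrative stand-ins for m1, m2, and mlam.

```r
set.seed(4)
sim <- data.frame(x = runif(400, 1, 10))
sim$y <- (2 + 1.5 * sim$x + rnorm(400, sd = 0.5))^2  # illustrative data

lam <- 0.5  # assumed known here; in practice it would be estimated
models <- list(
  raw    = lm(y ~ x, data = sim),
  log    = lm(log(y) ~ x, data = sim),
  boxcox = lm(y^lam ~ x, data = sim)
)

# W statistic for each model's residuals; values closer to 1 indicate residuals
# closer to normal. shapiro.test() accepts between 3 and 5000 observations.
W <- sapply(models, function(m) unname(shapiro.test(residuals(m))$statistic))
round(W, 4)
```

With large samples such as the 1173 Guns rows, the test rejects normality even for mild departures, which is why comparing W across models is more informative here than the p-values alone.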
## Start: AIC=2239.81
## violent^lam ~ year + murder + robbery + prisoners + afam + cauc +
## male + population + income + density + state + law
##
## Df Sum of Sq RSS AIC
## - cauc 1 0 6872 2237.8
## - density 1 0 6873 2237.9
## - afam 1 6 6879 2238.9
## - population 1 7 6879 2239.0
## - income 1 8 6880 2239.2
## <none> 6872 2239.8
## - law 1 78 6950 2251.0
## - male 1 100 6972 2254.7
## - murder 1 132 7005 2260.2
## - prisoners 1 218 7090 2274.4
## - year 22 1632 8504 2445.7
## - robbery 1 4594 11467 2838.3
## - state 50 33830 40702 4226.3
##
## Step: AIC=2237.81
## violent^lam ~ year + murder + robbery + prisoners + afam + male +
## population + income + density + state + law
##
## Df Sum of Sq RSS AIC
## - density 1 0 6873 2235.9
## - population 1 8 6880 2237.1
## - income 1 8 6881 2237.2
## <none> 6872 2237.8
## - afam 1 16 6888 2238.5
## - law 1 78 6951 2249.1
## - murder 1 133 7006 2258.3
## - male 1 194 7066 2268.4
## - prisoners 1 231 7104 2274.6
## - year 22 1825 8697 2470.0
## - robbery 1 4595 11467 2836.4
## - state 50 33910 40782 4226.6
##
## Step: AIC=2235.88
## violent^lam ~ year + murder + robbery + prisoners + afam + male +
## population + income + state + law
##
## Df Sum of Sq RSS AIC
## - population 1 7 6880 2235.1
## - income 1 9 6882 2235.4
## <none> 6873 2235.9
## - afam 1 23 6896 2237.8
## - law 1 79 6952 2247.3
## - murder 1 136 7009 2256.9
## - male 1 193 7066 2266.4
## - prisoners 1 390 7263 2298.6
## - year 22 1825 8698 2468.1
## - robbery 1 4594 11467 2834.4
## - state 50 43238 50111 4466.2
##
## Step: AIC=2235.13
## violent^lam ~ year + murder + robbery + prisoners + afam + male +
## income + state + law
##
## Df Sum of Sq RSS AIC
## - income 1 9 6889 2234.6
## <none> 6880 2235.1
## - afam 1 18 6898 2236.2
## - law 1 78 6958 2246.4
## - murder 1 130 7010 2255.0
## - male 1 199 7080 2266.6
## - prisoners 1 452 7332 2307.8
## - year 22 1849 8729 2470.3
## - robbery 1 4637 11517 2837.4
## - state 50 47934 54814 4569.5
##
## Step: AIC=2234.63
## violent^lam ~ year + murder + robbery + prisoners + afam + male +
## state + law
##
## Df Sum of Sq RSS AIC
## <none> 6889 2234.6
## - afam 1 15 6904 2235.1
## - law 1 89 6978 2247.7
## - murder 1 172 7061 2261.5
## - male 1 206 7095 2267.2
## - prisoners 1 455 7344 2307.7
## - year 22 1844 8733 2468.9
## - robbery 1 4688 11577 2841.6
## - state 50 48242 55131 4574.2
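From the stepwise regression, we see that cauc, density, population, and income should be removed (in that order), which lowers the AIC from 2239.81 to 2234.63. Because backward elimination is a greedy search, this is the lowest-AIC model along the path it explored, not necessarily the globally optimal model.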
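A minimal sketch of backward elimination by AIC with base R's `step()`; `MASS::stepAIC()` prints the same "Start:/Step:" style of trace. The simulated data are an assumption: only `x1` and `x2` affect `y`, while `noise1` and `noise2` are irrelevant.

```r
set.seed(5)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise1 = rnorm(n), noise2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)  # noise1/noise2 do not enter y

full <- lm(y ~ ., data = sim)
# Backward elimination: at each step drop the term whose removal lowers AIC most;
# stop when no removal helps. trace = 0 suppresses the step-by-step tables.
reduced <- step(full, direction = "backward", trace = 0)

formula(reduced)
c(full = extractAIC(full)[2], reduced = extractAIC(reduced)[2])
```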
## Calls:
## 1: lm(formula = violent^lam ~ ., data = Guns)
## 2: lm(formula = violent^lam ~ year + murder + robbery + prisoners +
## afam + male + state + law, data = Guns)
##
## Model 1 Model 2
## (Intercept) 11.3 11.6
## year1978 0.886 0.955
## year1979 2.32 2.39
## year1980 2.65 2.67
## year1981 2.41 2.45
## year1982 2.57 2.60
## year1983 2.32 2.39
## year1984 3.26 3.42
## year1985 4.14 4.35
## year1986 5.33 5.58
## year1987 5.46 5.75
## year1988 6.41 6.74
## year1989 6.99 7.36
## year1990 8.93 9.30
## year1991 9.55 9.90
## year1992 10.3 10.7
## year1993 10.7 11.1
## year1994 10.3 10.8
## year1995 9.96 10.41
## year1996 8.94 9.42
## year1997 8.90 9.43
## year1998 8.08 8.69
## year1999 6.98 7.63
## murder 0.152 0.161
## robbery 0.0499 0.0496
## prisoners 0.00993 0.01060
## afam -0.419 -0.337
## cauc -0.00345
## male 1.12 1.15
## population 0.148
## income 0.000135
## density -0.354
## stateAlaska 1.21 1.56
## stateArizona -0.53089 -0.00311
## stateArkansas -3.57 -3.56
## stateCalifornia -0.905 3.402
## stateColorado -4.31 -3.37
## stateConnecticut -9.31 -8.10
## stateDelaware -0.431 -0.277
## stateDistrict of Columbia 3.678 -0.974
## stateFlorida 9.82 11.71
## stateGeorgia -2.05 -1.61
## stateHawaii -7.67 -8.70
## stateIdaho -11.4 -11.1
## stateIllinois 1.08 3.01
## stateIndiana -7.29 -6.28
## stateIowa -12.8 -12.0
## stateKansas -7.50 -6.87
## stateKentucky -10.5 -10.1
## stateLouisiana 4.42 4.26
## stateMaine -17.1 -16.5
## stateMaryland 3.05 3.60
## stateMassachusetts 0.765 2.131
## stateMichigan -0.824 0.603
## stateMinnesota -13.5 -12.3
## stateMississippi -7.39 -8.02
## stateMissouri -2.36 -1.51
## stateMontana -15.6 -15.4
## stateNebraska -10.19 -9.57
## stateNevada -2.67 -2.23
## stateNew Hampshire -19.2 -18.3
## stateNew Jersey -5.28 -3.92
## stateNew Mexico 8.06 8.09
## stateNew York -3.505 -0.626
## stateNorth Carolina -1.45 -0.91
## stateNorth Dakota -25.0 -24.7
## stateOhio -9.26 -7.54
## stateOklahoma -3.22 -2.97
## stateOregon -3.06 -2.25
## statePennsylvania -10.26 -8.19
## stateRhode Island -7.29 -6.98
## stateSouth Carolina 10.4 10.1
## stateSouth Dakota -17.0 -16.7
## stateTennessee -1.484 -0.917
## stateTexas -5.87 -3.51
## stateUtah -13.1 -12.7
## stateVermont -18.9 -18.4
## stateVirginia -12.1 -11.3
## stateWashington -4.34 -3.20
## stateWest Virginia -15.5 -15.1
## stateWisconsin -16.9 -15.8
## stateWyoming -10.7 -10.3
## lawyes -1.09 -1.14
The larger model, mlam, has larger standard errors than the smaller model, m3, which resulted from the stepwise regression.
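Standard errors for the coefficients two models share can be compared directly from `vcov()`. If the car package is installed, `car::compareCoefs(small, large)` prints estimates and standard errors side by side. The sketch below uses simulated data in which the larger model adds a nearly collinear, irrelevant predictor `x2`, a common reason for inflated standard errors.

```r
set.seed(6)
n <- 200
sim <- data.frame(x1 = rnorm(n))
sim$x2 <- sim$x1 + rnorm(n, sd = 0.1)  # nearly collinear with x1, irrelevant to y
sim$y  <- 1 + 2 * sim$x1 + rnorm(n)

small <- lm(y ~ x1, data = sim)
large <- lm(y ~ x1 + x2, data = sim)

se <- function(m) sqrt(diag(vcov(m)))  # coefficient standard errors
shared <- intersect(names(coef(small)), names(coef(large)))
res <- cbind(small = se(small)[shared], large = se(large)[shared])
res
```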
## Analysis of Variance Table
##
## Model 1: violent^lam ~ year + murder + robbery + prisoners + afam + male +
## state + law
## Model 2: violent^lam ~ year + murder + robbery + prisoners + afam + cauc +
## male + population + income + density + state + law
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1094 6889.0
## 2 1090 6872.4 4 16.557 0.6565 0.6224
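The F-test produced by the anova function is not significant (F = 0.6565 on 4 and 1090 degrees of freedom, p-value = 0.6224), so the four extra terms in the larger model (cauc, population, income, and density) do not significantly improve the fit. We therefore prefer the smaller model (m3) over the larger model (mlam) and move forward with m3.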
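The partial F statistic in the table can be reproduced from the printed values: F = (extra sum of squares / extra df) / (RSS of the larger model / its residual df). The sketch uses the printed "Sum of Sq" (16.557) rather than the difference of the rounded RSS values.

```r
# Values copied from the anova table above.
ss_extra  <- 16.557; df_extra <- 4     # RSS reduction from cauc, population, income, density
rss_large <- 6872.4; df_large <- 1090  # residual SS and df of the larger model

F_stat <- (ss_extra / df_extra) / (rss_large / df_large)
p_val  <- pf(F_stat, df_extra, df_large, lower.tail = FALSE)  # upper-tail probability
round(c(F = F_stat, p = p_val), 4)
# anova(m3, mlam) performs the same computation from the fitted models.
```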
## StudRes Hat CookD
## 185 0.4156464 0.25978614 0.000731069
## 189 -5.0794642 0.25204347 0.102420286
## 195 3.9512822 0.13445287 0.028833267
## 207 0.4733689 0.29029093 0.001105054
## 1127 4.8407145 0.06827768 0.020271504
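The table lists studentized residuals, hat values (leverage), and Cook's distances for flagged observations. Observation 189, for example, has a studentized residual of about -5.08 and the largest Cook's distance shown (0.102). These quantities come from base R's `rstudent()`, `hatvalues()`, and `cooks.distance()`. The sketch below plants one outlier in simulated data. The flagging cutoffs (|StudRes| > 2, hat > 2p/n) are common rules of thumb and an assumption here, not necessarily the criteria behind the table above.

```r
set.seed(7)
n <- 100
sim <- data.frame(x = rnorm(n))
sim$y <- 1 + 2 * sim$x + rnorm(n)
sim$y[17] <- sim$y[17] + 8  # plant one large outlier (illustrative)

fit <- lm(y ~ x, data = sim)
diag_tab <- data.frame(
  StudRes = rstudent(fit),       # externally studentized residuals
  Hat     = hatvalues(fit),      # leverage
  CookD   = cooks.distance(fit)  # influence on the fitted coefficients
)

p <- length(coef(fit))
flag <- abs(diag_tab$StudRes) > 2 | diag_tab$Hat > 2 * p / n  # rule-of-thumb cutoffs
diag_tab[flag, ]
```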
* After examining the scaled scatterplot of the data, we determined that cauc needs to be transformed, so the next model uses log(cauc).
## Start: AIC=9240.41
## violent ~ year + murder + robbery + prisoners + afam + log(cauc) +
## male + population + income + density + state + law
##
## Df Sum of Sq RSS AIC
## - density 1 236 2685757 9238.5
## - log(cauc) 1 335 2685857 9238.6
## <none> 2685522 9240.4
## - income 1 5124 2690646 9240.6
## - afam 1 6355 2691877 9241.2
## - male 1 18702 2704223 9246.5
## - population 1 24164 2709685 9248.9
## - law 1 26090 2711611 9249.7
## - prisoners 1 177230 2862751 9313.4
## - murder 1 332563 3018084 9375.3
## - year 22 502422 3187943 9397.6
## - robbery 1 3052545 5738066 10129.0
## - state 50 8884729 11570250 10853.6
##
## Step: AIC=9238.51
## violent ~ year + murder + robbery + prisoners + afam + log(cauc) +
## male + population + income + state + law
##
## Df Sum of Sq RSS AIC
## - log(cauc) 1 219 2685976 9236.6
## <none> 2685757 9238.5
## - income 1 5544 2691301 9238.9
## - afam 1 6479 2692237 9239.3
## - male 1 18570 2704327 9244.6
## - population 1 24029 2709786 9247.0
## - law 1 26389 2712146 9248.0
## - prisoners 1 303717 2989474 9362.2
## - murder 1 336617 3022375 9375.0
## - year 22 502244 3188001 9395.6
## - robbery 1 3058367 5744124 10128.2
## - state 50 9307137 11992894 10893.7
##
## Step: AIC=9236.6
## violent ~ year + murder + robbery + prisoners + afam + male +
## population + income + state + law
##
## Df Sum of Sq RSS AIC
## <none> 2685976 9236.6
## - income 1 5838 2691814 9237.2
## - afam 1 16775 2702751 9241.9
## - law 1 26764 2712740 9246.2
## - male 1 27232 2713208 9246.4
## - population 1 27239 2713215 9246.4
## - prisoners 1 303511 2989487 9360.2
## - murder 1 336791 3022767 9373.2
## - year 22 527892 3213868 9403.1
## - robbery 1 3136507 5822483 10142.1
## - state 50 10803349 13489325 11029.7
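Interestingly, with log(cauc) in the model, the stepwise regression removes only density and log(cauc) and keeps income (as well as population). Note that this model uses the untransformed response, violent, so its AIC values are not comparable to those of the violent^lam models above.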
## mse.m1 mse.m2 mse.m3 mse.m4 mse.mlam
## 1 2889.625 0.01761853 7.150634 2890.016 7.201197
## 2 2890.438 0.01885826 7.429532 2834.517 7.645189
## 3 2738.429 0.01737850 6.972956 2742.381 6.989694
## 4 2859.606 0.01764531 7.123702 2836.695 7.216332
## 5 2712.441 0.01768070 6.987390 2706.323 7.057505
## 6 2919.177 0.01747342 7.073046 2921.903 7.136046
## 7 2933.971 0.01829054 7.059834 2922.766 7.310023
## 8 2712.441 0.01768070 6.987390 2706.323 7.057505
## 9 2919.177 0.01747342 7.073046 2921.903 7.136046
## 10 2933.971 0.01829054 7.059834 2922.766 7.310023
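Two cautions apply when reading this cross-validation table. First, judging by their magnitudes, the columns are on different response scales (m1 and m4 on violent, m2 on log(violent), m3 and mlam on violent^lam), so only columns on the same scale are directly comparable. Second, rows 8 to 10 repeat rows 5 to 7 exactly, which may mean the folds were not assigned as intended. Below is a minimal k-fold sketch in base R that assigns every row to exactly one fold. The helper `cv_mse()`, the fold count, the seeds, and the simulated data are illustrative assumptions.

```r
# k-fold cross-validated MSE for a linear model formula.
# cv_mse() is a hypothetical helper; the MSE is on the scale of the formula's response.
cv_mse <- function(formula, data, k = 10, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # each row in exactly one fold
  sapply(seq_len(k), function(i) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    fit   <- lm(formula, data = train)
    y_obs <- model.response(model.frame(formula, data = test))  # response on the model's scale
    mean((y_obs - predict(fit, newdata = test))^2)
  })
}

set.seed(8)
sim <- data.frame(x = runif(500, 1, 10))
sim$y <- (2 + 1.5 * sim$x + rnorm(500, sd = 0.5))^2  # illustrative data

mse_raw  <- cv_mse(y ~ x, sim)
mse_sqrt <- cv_mse(sqrt(y) ~ x, sim)
round(rbind(raw = mse_raw, sqrt = mse_sqrt), 4)
```

The two rows differ by orders of magnitude only because they are measured on different scales, which illustrates the first caution above.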
```